Backwards Conversion, Solving for PSA Parameters and Copula-Based PSA Sampling
source("./manifest.r")
Learning Objectives
Back-convert an existing transition probability matrix to incorporate new health states, strategies, and evidence.
Solve for probabilistic sensitivity analysis (PSA) distribution parameters.
Sample correlated PSA distributions using copulas.
1. Backwards Conversion
Lectures 2 and 3 emphasized the importance of the transition rate matrix as the central “hub” of a Markov model.
Goal
What Rarely Happens
Adapting a Model Requires a Transition Rate Matrix
1. Backwards Conversion
If starting a model from scratch, we can simply define or estimate rates and use them to construct the rate matrix.
1. Backwards Conversion
What if we have a model that is already defined in terms of a transition probability matrix?
How can we convert back to a rate matrix to add new health states, transition states, accumulators, etc. as needed?
Also useful if we want to keep everything the same, but change the model time step (e.g., see Chhatwal, Jayasuriya, and Elbasha (2016))
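A minimal sketch of why this is useful: with a generator (rate) matrix in hand, changing the cycle length is just a matter of re-embedding, i.e., exponentiating the rate matrix scaled by the new time step. The 3-state rate matrix below is hypothetical, not the HIV model used later in this section:

```r
library(expm)

# Hypothetical 3-state generator matrix (rows sum to zero)
R <- matrix(c(-0.20, 0.15, 0.05,
               0.00, -0.10, 0.10,
               0.00,  0.00, 0.00),
            nrow = 3, byrow = TRUE)

P_annual  <- expm(R * 1)      # transition probabilities for a 1-year cycle
P_monthly <- expm(R * 1/12)   # transition probabilities for a monthly cycle

# Twelve monthly cycles recover the annual transition matrix
max(abs((P_monthly %^% 12) - P_annual))
```

Because both matrices are embedded from the same generator, compounding twelve monthly cycles reproduces the annual matrix exactly (up to floating-point error).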
1. Backwards Conversion
We will next show you several ways to work backwards.
The problem boils down to solving for the continuous-time “generator matrix” that corresponds to the observed transition probability matrix.
A Word of Warning
As you’ll see, it’s not always going to work perfectly.
If the original transition probability matrix was defined incorrectly (e.g., no jumpover probabilities), the generator matrix may not exist.
The negative rate implies an identifiability issue.
Note that the Healthy → AIDS and Healthy → Death rates are relatively small, suggesting they reflect statistical noise in observation.
A. Matrix Logarithm of P
The Higham (2008) method, implemented in expm::logm(), returns the same result as the eigenvalue method.
Tweaking rates to get a proper model is ad hoc and difficult.
R <- expm::logm(mP)
dimnames(R) <- dimnames(mP)
round(R, 3)
Healthy LowCD4 AIDS Death
Healthy -0.327 0.311 0.002 0.013
LowCD4 0.000 -0.543 0.615 -0.072
AIDS 0.000 0.000 -0.288 0.288
Death 0.000 0.000 0.000 0.000
B. Multi-State Model Based Approach
Imposition of structural assumptions and fitting to pseudo-data derived from original Markov model can result in reasonable rate estimates.
We could assume that a patient with HIV at this point in time only gets sicker and that external causes of death are negligible; this constrains the model to the progressive structure Healthy → LowCD4 → AIDS → Death.
We use the reported data from the original model at year 1.
B. Multi-State Model Based Approach
Steps:
Run the original model for 1+ cycles to obtain the Markov trace.
Construct pseudo-data from the resulting trace based on a cohort of reasonable size (e.g., 1,000 patients).
Estimate transition hazards in the pseudo-data using a multistate model.
Use the estimated transition hazards to construct the rate matrix.
1. Run the original model
tr0 <- c("Healthy" = 1000, "LowCD4" = 0, "AIDS" = 0, "Death" = 0)  # Initial state occupancy
tr1 <- tr0 %*% mP  # State occupancy after one cycle
tr <- rbind.data.frame(tr0, tr1) %>%  # Bind together into a data frame
  mutate(cycle = c(0, 1)) %>%
  select(cycle, everything())
tr
cycle Healthy LowCD4 AIDS Death
1 0 1000 0 0 0
2 1 721 202 67 10
2. Construct pseudo-data
Idea: create data for 1,000 simulated “patients” followed for two cycles.
Counts in the Markov trace govern state occupancy in each cycle.
In first cycle, all 1,000 are in Healthy state.
In second cycle, 721 remain in Healthy while 202 are in LowCD4, etc.
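To make the pseudo-data step concrete, here is one minimal sketch of expanding the trace counts into individual-level records. The column layout (id, cycle, state) is illustrative; the exact format depends on the multistate package used (e.g., msm expects one row per subject-time observation):

```r
states  <- c("Healthy", "LowCD4", "AIDS", "Death")
counts0 <- c(1000, 0, 0, 0)     # cycle-0 occupancy from the trace
counts1 <- c(721, 202, 67, 10)  # cycle-1 occupancy from the trace

# One row per patient per cycle; because the assumed model is progressive
# (patients only move "forward"), assigning states in order yields
# internally consistent individual trajectories.
pseudo <- data.frame(
  id    = rep(1:1000, times = 2),
  cycle = rep(c(0, 1), each = 1000),
  state = c(rep(states, counts0), rep(states, counts1))
)
table(pseudo$cycle, pseudo$state)
```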
The resulting expectation after one cycle is nearly identical.
The msm-based estimates differ only in the fourth decimal place, and rounding makes them match exactly.
Note that even a tiny amount of error (e.g., 721 vs. 721.0005) can make the matrix logarithm numerically unstable when converting from probabilities to rates.
Key Takeaways
By imposing some constraints on the underlying transitions, we were able to yield a generator matrix that makes sense!
Critically, our multistate model-based generator matrix closely approximates the original transition probability matrix, but does not imply negative rates that bring people back from death.
Healthy LowCD4 AIDS Death
Healthy -0.327 0.327 0.000 0.000
LowCD4 0.000 -0.645 0.635 0.010
AIDS 0.000 0.000 -0.348 0.348
Death 0.000 0.000 0.000 0.000
Key Takeaways
With a reasonable generator matrix defined, we can now augment the model as we see fit:
Add additional health states.
Add new strategies with evidence on efficacy drawn from the literature (e.g., hazard rates).
Add accumulators and transition states.
Change the cycle length.
Once done with the above, just embed the new matrix.
2. Solving for PSA Distributions
A Common Issue
We have defined our underlying model (either from bottom up or through backwards conversion) but need to define PSA distributions for model parameters.
Model draws on literature-based parameters that are reported as means, standard deviations, interquartile ranges, 95% confidence intervals, etc.
A Common Issue
Straightforward to obtain base case values from literature.
But how can we define PSA distributions based on limited information?
PSA Distributions
Parameter Type                    Distribution
Probability                       beta
Rate                              gamma
Utility weight                    beta
Right skew (e.g., cost)           gamma, lognormal
Relative risks or hazard ratios   lognormal
Odds Ratio                        logistic
Example
Cost parameter reported in literature has interquartile range of $300-$750.
What are the parameters of a PSA distribution such that the 25th percentile is $300 and the 75th percentile is $750?
Some Options
Analytic formulas for some distributions (normal, gamma, beta).
Formulas take as their inputs the two values (\(x_1\), \(x_2\)) and their associated quantiles (\(p_1\),\(p_2\)).
Formulas return the PSA distribution parameters that match up to these values.
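For instance, the normal distribution has a closed form (this mirrors the quantile-matching approach in Cook (2010), applied here to the $300–$750 cost example purely for illustration; the gamma and beta cases generally require numerical methods, as shown next):

```r
# Two reported quantiles (the cost example: IQR of $300-$750)
x1 <- 300; p1 <- 0.25
x2 <- 750; p2 <- 0.75

# Closed-form normal parameters matching those quantiles
z1 <- qnorm(p1)
z2 <- qnorm(p2)
sigma_hat <- (x2 - x1) / (z2 - z1)
mu_hat    <- x1 - sigma_hat * z1

c(mu = mu_hat, sigma = sigma_hat)

# Check: the implied quantiles match the targets
qnorm(c(p1, p2), mu_hat, sigma_hat)
```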
# Transfinite scaling
Tf <- function(x, shape, rate)
  pgamma(x, shape, rate, log = TRUE) -
  pgamma(x, shape, rate, log = TRUE, lower.tail = FALSE)

# Function to minimize
norm <- function(x1, p1, x2, p2, shape, rate)
  (Tf(x1, shape, rate) - (log(p1) - log(1 - p1)))^2 +
  (Tf(x2, shape, rate) - (log(p2) - log(1 - p2)))^2

# Bundle it all together into a single function.
fn <- function(x) norm(x1, p1, x2, p2, x[1], x[2])
Transfinite Example, Part 2
# Run general-purpose optimization on the function.
gamma_optim <- optim(
  c(0.5, 0.1),  # initial parameter guesses
  fn,
  gr      = function(x) pracma::grad(fn, x),
  lower   = c(-Inf, 1e-5),
  method  = "L-BFGS-B",
  control = list(factr = 1e-10, maxit = 100)
)$par
# Note: gamma_optim[2] is 1/beta
gamma_optim
[1] 2.4558556 0.0043449
Transfinite Example
Optimization returns \(\alpha\) and \(1/\beta\) for the gamma distribution example.
# Analytic formula
alpha_
[1] 2.4558
# Optimization
gamma_optim[[1]]
[1] 2.4559
# Analytic formula
beta_
[1] 230.16
# Optimization
1/gamma_optim[[2]]
[1] 230.16
Optimization for Other Distributions
Just swap in the distribution function for the distribution you want to match to (e.g., pbeta rather than pgamma).
Also make sure the supplied parameter names match the distribution you’re aiming for.
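As a sketch of the same idea for a beta-distributed parameter (the utility quantiles below are hypothetical, chosen only for illustration), one can also minimize squared differences between implied and target quantiles directly:

```r
# Hypothetical utility weight reported with 25th percentile 0.60
# and 75th percentile 0.80
x1 <- 0.60; p1 <- 0.25
x2 <- 0.80; p2 <- 0.75

# Objective: squared distance between implied and target quantiles
beta_obj <- function(par)
  (qbeta(p1, par[1], par[2]) - x1)^2 +
  (qbeta(p2, par[1], par[2]) - x2)^2

beta_par <- optim(c(2, 2), beta_obj, method = "L-BFGS-B",
                  lower = c(1e-5, 1e-5))$par

# Implied quantiles should now match the targets
round(qbeta(c(p1, p2), beta_par[1], beta_par[2]), 3)
```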
# 1. Define the correlation matrix
rho <- 0.8
sigma <- matrix(c(1, rho, rho, 1), nrow = 2, byrow = TRUE)

# 2. Perform a Cholesky decomposition
A <- chol(sigma)

# 3. Generate iid standard normal pseudo-random variables
tildeY0 <- rnorm(n = 1e3, mean = 0, sd = 1)
tildeY1 <- rnorm(n = 1e3, mean = 0, sd = 1)
tildeY <- cbind(tildeY0, tildeY1)

# 4. Induce correlation
tildeY <- tildeY %*% A

# 5. Use the standard normal distribution function to return the quantiles
U <- pnorm(tildeY)

# 6. Use the inverse distribution function to return the values
cx1 <- qgamma(U[, 1], shape = 2.5, rate = 1/230.16)
cx2 <- qgamma(U[, 2], shape = 1.5, rate = 1/150)

tibble(x1 = cx1, x2 = cx2) %>%
  ggplot(aes(x = x2, y = x1)) +
  geom_point() +
  hrbrthemes::theme_ipsum_pub(base_family = "Arial") +
  labs(x = "Direct Medical Costs", y = "Community Costs")
What We’ll Teach You
Red points: independent PSA draws
Black points: correlated PSA draws
tibble(x1 = x1, x2 = x2) %>%
  ggplot(aes(x = x2, y = x1)) +
  geom_point(alpha = 0.5, colour = "red") +
  geom_point(data = tibble(x1 = cx1, x2 = cx2),
             aes(x = cx2, y = cx1), alpha = 0.25) +
  hrbrthemes::theme_ipsum_pub(base_family = "Arial") +
  labs(x = "Direct Medical Costs", y = "Community Costs") +
  stat_ellipse(data = MASS::mvrnorm(n = 1e3, mu = c(1500, 750),
                                    Sigma = matrix(c(40000, 0, 0, 20000),
                                                   byrow = TRUE, nrow = 2)) %>%
                 data.frame() %>%
                 set_names(c("x1", "x2")),
               lty = 3, colour = "black") +
  annotate("text", x = 750, y = 1500,
           label = "Traditional PSA draws (red points)\n very sparse in this region.",
           fontface = 2, color = "red")
Copula-Based PSA Sampling
We can easily draw correlated PSA samples via copulas.
Copulas allow us to sample from the joint distribution of correlated parameters.
Copulas
Copula: a multivariate cumulative distribution function with uniform marginals.
Intuition: the binding “glue” between random variables.
Sklar (1959)
Sklar (1959) demonstrates that the joint distribution \(H\) of two random variables can be represented as
\[H(x, y) = C(F(x), G(y)),\]
where \(F\) and \(G\) are the marginal distribution functions and \(C\) is a copula.
Inverse Transform Sampling
Define PSA distribution parameters (e.g., \(\mu,\sigma\) if normal, etc.)
Draw a uniform random number between 0 and 1.
Feed this number through the quantile function for the specified distribution (e.g., qnorm(), qgamma(), etc.). This returns the value that maps to the randomly drawn quantile.
Do this for as many PSA samples as you need!
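The steps above can be sketched for a single parameter, reusing the (rounded) gamma parameters solved for in the earlier cost example:

```r
set.seed(123)
n_psa <- 1e4

# Step 2: uniform draws on (0, 1)
u <- runif(n_psa)

# Step 3: feed each draw through the quantile function
cost_draws <- qgamma(u, shape = 2.4559, scale = 230.16)

# The sampled quartiles should be close to the $300-$750 IQR targets
round(quantile(cost_draws, c(0.25, 0.75)))
```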
Copula-Based Sampling
Sampling from a copula adds one more step to this process.
We must also define a correlation matrix Sigma (\(\Sigma\)).
We then draw uniform random numbers from a multivariate uniform distribution.
Once drawn, we can feed each PSA variable back through its respective inverse CDF (quantile function) to obtain the final PSA sample.
Copula-Based Sampling
Define PSA distributions for each of \(n\) uncertain parameters.
Define an \(n \times n\) correlation matrix \(\Sigma\) for the joint distribution of the PSA parameters.
Draw an \(n\)-column multivariate uniform sample.
Apply the inverse transform method (as necessary) to each column.
# Alternative Version: Cholesky decomposition
# Perform a Cholesky decomposition
A <- chol(sigma)
# Generate iid standard normal pseudo-random variables
tildeY0 <- rnorm(n = PSA_sample_size, mean = 0, sd = 1)
tildeY1 <- rnorm(n = PSA_sample_size, mean = 0, sd = 1)
tildeY <- cbind(tildeY0, tildeY1)
# Induce correlation
tildeY <- tildeY %*% A
# Use the standard normal distribution function to return the quantiles
U <- pnorm(tildeY)
4. Inverse Transform
# Inverse transform using parameters solved for earlier
cost_cc <- qlnorm(U[, 1], mu_cc, sigma_cc)
cost_dc <- qgamma(U[, 2], shape_dc, rate_dc)
# Make sure the marginal distributions check out
quantile(cost_cc, c(0.25, 0.75))
25% 75%
750.27 1249.99
quantile(cost_dc, c(0.25, 0.75))
25% 75%
999.21 2997.30
# Check that correlation is there
cor(cost_cc, cost_dc)
[1] 0.78332
Your Turn!
References
Chhatwal, Jagpreet, Suren Jayasuriya, and Elamin H. Elbasha. 2016. “Changing Cycle Lengths in State-Transition Models: Challenges and Solutions.” Medical Decision Making 36 (8): 952–64. https://doi.org/10.1177/0272989X16656165.
Cook, John D. 2010. “Determining Distribution Parameters from Quantiles.” Working paper.
Higham, Nicholas J. 2008. Functions of Matrices: Theory and Computation. SIAM.